Back to Article
Spatial dependence
Download Notebook

Open In ColabThis notebook examines spatial dependence in nighttime luminosity across 520 Indian districts using Local Indicators of Spatial Association (LISA). We apply Local Moran’s I to both the initial level and the growth rate of luminosity per capita, and visualize the results as cluster maps identifying statistically significant spatial clusters (HH, LL) and outliers (HL, LH).

Setup

In [1]:
# Importing necessary libraries for data analysis and visualization
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
import matplotlib.pyplot as plt
import matplotlib.image as mpimg  # Importing matplotlib image for image plotting
import seaborn as sns

import plotly.express as px
import plotly.graph_objects as go

# Importing libraries for spatial data and visualization
import geopandas as gpd
import folium
from folium import Figure

import contextily as cx

import libpysal
from libpysal  import weights
from libpysal.weights import Queen

# Exploratory Spatial Data Analysis (ESDA) tools
import mapclassify as mc
import esda
from esda.moran import Moran, Moran_Local

# Spatial plotting tools
import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights
from splot.mapping import vba_choropleth

# Statistical modeling
import statsmodels.api as sm
import statsmodels.formula.api as smf

import warnings
warnings.filterwarnings('ignore')


RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

Data

We load district geometries and nighttime lights variables from the GeoPackage, which already contains the 520-district sample merged with all analysis variables.

In [2]:
# Load data from GitHub
import urllib.request, tempfile, os
url = "https://raw.githubusercontent.com/quarcs-lab/project2025s/master/data/maps/india_2001_520.gpkg"
temp_gpkg = os.path.join(tempfile.gettempdir(), "india_2001_520.gpkg")
urllib.request.urlretrieve(url, temp_gpkg)
gdf = gpd.read_file(temp_gpkg, engine="pyogrio")
In [3]:
# Rename key variables for convenience
gdf = gdf.rename(columns={
    "newdata_light_growth96_10rcr_cap": "growth",
    "newdata_log_light96_rcr_cap": "log_initial",
    "NAME2_": "district",
})

# Convert from string to numeric (GeoPackage stores these as text)
gdf["growth"] = pd.to_numeric(gdf["growth"], errors="coerce")
gdf["log_initial"] = pd.to_numeric(gdf["log_initial"], errors="coerce")

print(f"Districts: {len(gdf)}")
gdf[["district", "growth", "log_initial"]].describe()
Districts: 520
growth log_initial
count 520.000000 520.000000
mean 0.023523 -4.818477
std 0.043073 1.093441
min -0.095267 -8.468901
25% -0.001208 -5.488250
50% 0.022962 -4.549616
75% 0.037141 -3.952151
max 0.282837 -2.182392

The interactive choropleth below maps per capita luminosity growth across all 520 districts. Hover over individual districts to inspect their values.

In [4]:
# Visualize spatial data using the explore() method of a GeoDataFrame
gdf.explore(
    # Specify the column to visualize on the map
    column='growth',
    # Specify the attributes to display in the tooltip when hovering over map features
    tooltip=['district', 'log_initial', 'growth'],
    # Choose the classification scheme for data visualization
    scheme='fisherjenks',
    # Specify the number of classes for classification
    k=5,
    # Choose the colormap for data visualization
    cmap='coolwarm',
    # Specify whether to display a legend
    legend=True,
    # Choose the basemap tiles provider
    tiles='CartoDB positron',
    # Customize the style of the basemap tiles
    style_kwds=dict(color="gray", weight=0.5),
    # Customize the appearance of the legend
    legend_kwds=dict(colorbar=False)
)
Make this Notebook Trusted to load map: File -> Trust Notebook

Spatial weights and lags

We construct a Queen contiguity weights matrix directly from the district geometries and row-normalize it, consistent with the weights used in the main econometric analysis.

We reproject the geometries to an India-specific CRS (EPSG:7755) for proper basemap overlay, then compute the spatial lag of each variable—the weighted average of neighboring districts’ values.

In [5]:
# Generate and export weight matrix
W = weights.KNN.from_dataframe(gdf, k=6, ids = 'Census_no')
W_matrix, ids = W.full()
df_W_matrix = pd.DataFrame(W_matrix)
df_ids = pd.DataFrame({'polyID': ids})
try:
    df_W_matrix.to_csv('../data/W_matrix.csv', index = False)
    df_W_matrix.to_stata('../data/W_matrix.dta', write_index = False)
except OSError:
    pass  # Skip file export when running in Colab
print(f"Weights: {W.n} observations, mean neighbors = {W.mean_neighbors:.1f}")
Weights: 520 observations, mean neighbors = 6.0
In [6]:
#  Row-standardize W
W.transform = 'r'
In [7]:
# Plot the spatial weights matrix
# This will visualize the spatial relationships between observations defined by the weights matrix W
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(14,10))
plot_spatial_weights(W, gdf, indexed_on='Census_no', ax=ax)
cx.add_basemap(ax, crs=gdf.crs.to_string(), source=cx.providers.CartoDB.Positron, attribution=False)
cx.add_basemap(ax, crs=gdf.crs.to_string(), source=cx.providers.CartoDB.PositronOnlyLabels, attribution=False)
ax.set_axis_off()
plt.show()
Figure 1: Spatial connectivity structure based on six nearest neighbors
Note: See Spatial dependence notebook for source code.
Source: Data from Chanda and Kabiraj (2000).
In [8]:
# Calculate spatial lag of INDICATOR1 using the specified weights
gdf['Wlog_initial'] = weights.lag_spatial(W, gdf['log_initial'])
gdf['Wgrowth']      = weights.lag_spatial(W, gdf['growth'])

Spatial dependence for initial (log) luminosity

Global Moran’s I summarizes the overall degree of spatial autocorrelation across all districts. Values near +1 indicate strong positive clustering; values near 0 indicate spatial randomness.

In [9]:
# Create a scatter plot using Plotly Express
px.scatter(
    gdf,
    x='log_initial',                               # Data for the x-axis
    y='Wlog_initial',                              # Data for the y-axis
    hover_name='district',                  # Display district name in hover tooltip
    hover_data=['district', 'log_initial', 'Wlog_initial'],    # Additional data to display in hover tooltip
    trendline="ols",                        # Add an ordinary least squares (OLS) trendline
    marginal_x="box",                       # Display marginal box plot on the x-axis
    marginal_y="box"                       # Display marginal box plot on the y-axis
)
Unable to display output for mime type(s): application/vnd.plotly.v1+json
In [10]:
# Compute Global Moran's I statistic for the 'log_initial' variable using the spatial weights matrix W
globalMoran = Moran(gdf['log_initial'], W)

# Format Moran's I statistic to two decimal places
moranI1 = "{:.2f}".format(globalMoran.I)

# Print Moran's I statistic
moranI1
'0.73'

While Global Moran’s I captures the overall pattern, Local Moran’s I (LISA) identifies where significant clusters and outliers are located. Each district is classified into one of four quadrants: High-High (HH, red), Low-Low (LL, blue), High-Low (HL, orange), or Low-High (LH, light blue). The Moran scatterplot shows all districts by quadrant, while the cluster map reveals their geographic distribution.

In [11]:
print(globalMoran.p_sim)
0.001
In [12]:
# Calculate Local Moran's I statistics
moranLocal = Moran_Local(gdf['log_initial'], W, permutations=999, seed=12345)
In [13]:
# Initialize the subplots
f, ax = plt.subplots(1, 2, figsize=(14, 7))

# 1. Plot Moran Scatterplot and customize labels
moran_scatterplot(moranLocal, p=0.05, zstandard=False, aspect_equal=f, ax=ax[0])
ax[0].set_title(f"(a) Moran scatterplot (Moran's I = {moranI1})", fontsize=14)
ax[0].set_xlabel("Log of luminosity per capita 1996", fontsize=12)
ax[0].set_ylabel("Log of luminosity per capita 1996 in neighboring regions", fontsize=12)

# 2. Plot LISA Cluster map and customize title
lisa_cluster(moranLocal, gdf, p=0.05, 
             legend_kwds={'bbox_to_anchor':(1.05, 1), 'loc': 'upper left'}, 
             ax=ax[1])
ax[1].set_title("(b) LISA Cluster Map (p < 0.05)", fontsize=14)

# 3. Add basemap to the cluster map
cx.add_basemap(ax[1], 
               crs=gdf.crs.to_string(), 
               source=cx.providers.CartoDB.Positron, 
               attribution=False)

# Optional: Remove axes ticks for the map to make it cleaner
ax[1].set_axis_off()

plt.tight_layout()
try:
    plt.savefig("../figures/lisaMAP1.png", dpi=150, bbox_inches='tight')
except OSError:
    pass  # Skip file export when running in Colab
plt.show()
Figure 2: Spatial dependence in the initial level of luminosity
Note: See Regional convergence notebook for source code.
Source: Data from Chanda and Kabiraj (2000).

Spatial dependence for luminosity growth

Global Moran’s I for growth confirms significant positive spatial autocorrelation. We proceed to the local analysis.

In [14]:
# Create a scatter plot using Plotly Express
px.scatter(
    gdf,
    x='growth',                               # Data for the x-axis
    y='Wgrowth',                              # Data for the y-axis
    hover_name='district',                  # Display district name in hover tooltip
    hover_data=['district', 'growth', 'Wgrowth'],    # Additional data to display in hover tooltip
    trendline="ols",                        # Add an ordinary least squares (OLS) trendline
    marginal_x="box",                       # Display marginal box plot on the x-axis
    marginal_y="box"                       # Display marginal box plot on the y-axis
)
Unable to display output for mime type(s): application/vnd.plotly.v1+json
In [15]:
# Compute Global Moran's I statistic for the 'growth' variable using the spatial weights matrix W
globalMoran2 = Moran(gdf['growth'], W)

# Format Moran's I statistic to two decimal places
moranI2 = "{:.2f}".format(globalMoran2.I)

# Print Moran's I statistic
moranI2
'0.60'
In [16]:
print(globalMoran2.p_sim)
0.001

The LISA cluster map for growth reveals the geographic pattern of convergence dynamics: HH clusters mark regions where both the district and its neighbors grew fast, while LL clusters identify persistently slow-growing areas. Together with the initial luminosity results above, these local patterns support the use of spatial econometric models to account for spatial spillovers in the convergence process.

In [17]:
# Calculate Local Moran's I statistics
moranLocal2 = Moran_Local(gdf['growth'], W, permutations=999, seed=12345)
In [18]:
# Initialize the subplots
f, ax = plt.subplots(1, 2, figsize=(14, 7))

# 1. Plot Moran Scatterplot and customize labels
moran_scatterplot(moranLocal2, p=0.05, zstandard=False, aspect_equal=f, ax=ax[0])
ax[0].set_title(f"(a) Moran scatterplot (Moran's I = {moranI2})", fontsize=14)
ax[0].set_xlabel("Growth luminosity per capita 1990-2010", fontsize=12)
ax[0].set_ylabel("Growth luminosity per capita 1990-2010 in neighboring regions", fontsize=12)

# 2. Plot LISA Cluster map and customize title
lisa_cluster(moranLocal2, gdf, p=0.05, 
             legend_kwds={'bbox_to_anchor':(1.05, 1), 'loc': 'upper left'}, 
             ax=ax[1])
ax[1].set_title("(b) LISA Cluster Map (p < 0.05)", fontsize=14)

# 3. Add basemap to the cluster map
cx.add_basemap(ax[1], 
               crs=gdf.crs.to_string(), 
               source=cx.providers.CartoDB.Positron, 
               attribution=False)

# Optional: Remove axes ticks for the map to make it cleaner
ax[1].set_axis_off()

plt.tight_layout()
try:
    plt.savefig("../figures/lisaMAP2.png", dpi=150, bbox_inches='tight')
except OSError:
    pass  # Skip file export when running in Colab
plt.show()
Figure 3: Spatial dependence in the growth rate of luminosity
Note: See Regional convergence notebook for source code.
Source: Data from Chanda and Kabiraj (2000).